Bllip: An Improved Evaluation Metric for Machine Translation
Authors
Abstract
In this paper we present a new automatic scoring method for machine translation. Like the now-traditional BLEU score, it maps a proposed translation and a set of reference translations to a real number intended to reflect the quality of the proposed translation. We present experiments indicating that this new metric, the Bllip score (named for the Brown Laboratory for Linguistic Information Processing), correlates better with human judgment than BLEU does.
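For reference, the BLEU baseline that Bllip is compared against scores a candidate by clipped n-gram precision against the references, combined as a geometric mean and scaled by a brevity penalty. Below is a minimal sentence-level sketch of that idea; the function name, tokenization by whitespace, and the closest-reference tie-breaking are our illustrative choices, not the exact NIST implementation.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU-style score: geometric mean of clipped
    n-gram precisions (n = 1..max_n) times a brevity penalty."""
    cand = candidate.split()
    refs = [r.split() for r in references]
    log_prec_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        if not cand_counts:
            return 0.0
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in refs:
            for g, c in ngram_counts(ref, n).items():
                if c > max_ref[g]:
                    max_ref[g] = c
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        if clipped == 0:
            return 0.0
        log_prec_sum += math.log(clipped / sum(cand_counts.values()))
    # Brevity penalty: penalize candidates shorter than the closest reference.
    c_len = len(cand)
    r_len = min((len(r) for r in refs), key=lambda rl: (abs(rl - c_len), rl))
    bp = 1.0 if c_len >= r_len else math.exp(1 - r_len / c_len)
    return bp * math.exp(log_prec_sum / max_n)
```

An exact match scores 1.0, and a candidate sharing no n-grams with any reference scores 0.0; real implementations add smoothing so a single missing high-order n-gram does not zero out the whole sentence score.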
Similar Resources
Meteor 1.3: Automatic Metric for Reliable Optimization and Evaluation of Machine Translation Systems
This paper describes Meteor 1.3, our submission to the 2011 EMNLP Workshop on Statistical Machine Translation automatic evaluation metric tasks. New metric features include improved text normalization, higher-precision paraphrase matching, and discrimination between content and function words. We include Ranking and Adequacy versions of the metric shown to have high correlation with human judgm...
METEOR: An Automatic Metric for MT Evaluation with High Levels of Correlation with Human Judgments
Meteor is an automatic metric for Machine Translation evaluation which has been demonstrated to have high levels of correlation with human judgments of translation quality, significantly outperforming the more commonly used Bleu metric. It is one of several automatic metrics used in this year’s shared task within the ACL WMT-07 workshop. This paper recaps the technical details underlying the me...
Automatic Evaluation Measures for Statistical Machine Translation System Optimization
Evaluation of machine translation (MT) output is a challenging task. In most cases, there is no single correct translation. In the extreme case, two translations of the same input can have completely different words and sentence structure while still both being perfectly valid. Large projects and competitions for MT research raised the need for reliable and efficient evaluation of MT systems. F...
Re-evaluating the Role of Bleu in Machine Translation Research
We argue that the machine translation community is overly reliant on the Bleu machine translation evaluation metric. We show that an improved Bleu score is neither necessary nor sufficient for achieving an actual improvement in translation quality, and give two significant counterexamples to Bleu’s correlation with human judgments of quality. This offers new potential for research which was pre...
Journal:
Volume / Issue:
Pages: -
Publication year: 2006